Overview

Dataset statistics

Number of variables: 10
Number of observations: 20640
Missing cells: 207
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.6 MiB
Average record size in memory: 80.0 B
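The percentage and memory figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming each cell occupies 8 bytes (an assumption, chosen because it matches the reported 80.0 B for a 10-column record); the variable names are illustrative only:

```python
# Counts taken from the dataset statistics
n_variables = 10
n_observations = 20640
missing_cells = 207

# Share of missing cells over the whole table
total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# Memory estimate; 8 bytes per cell is an assumption, not stated in the report
bytes_per_cell = 8
record_size_bytes = n_variables * bytes_per_cell
total_size_mib = n_observations * record_size_bytes / 2**20

print(f"Missing cells (%): {missing_cells_pct:.1f}%")
print(f"Average record size: {record_size_bytes} B")
print(f"Total size in memory: {total_size_mib:.1f} MiB")
```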

Variable types

Numeric: 9
Categorical: 1

Warnings

longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
total_rooms is highly correlated with total_bedrooms and 2 other fields (High correlation)
total_bedrooms is highly correlated with total_rooms and 2 other fields (High correlation)
population is highly correlated with total_rooms and 2 other fields (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
median_income is highly correlated with median_house_value (High correlation)
median_house_value is highly correlated with median_income (High correlation)
households is highly correlated with population and 2 other fields (High correlation)
median_house_value is highly correlated with longitude and 3 other fields (High correlation)
longitude is highly correlated with median_house_value and 2 other fields (High correlation)
latitude is highly correlated with median_house_value and 2 other fields (High correlation)
median_income is highly correlated with median_house_value (High correlation)
population is highly correlated with households and 2 other fields (High correlation)
total_bedrooms is highly correlated with households and 2 other fields (High correlation)
ocean_proximity is highly correlated with median_house_value and 2 other fields (High correlation)
total_rooms is highly correlated with households and 2 other fields (High correlation)
total_bedrooms has 207 (1.0%) missing values (Missing)

Reproduction

Analysis started: 2021-09-01 15:18:03.570469
Analysis finished: 2021-09-01 15:19:50.383643
Duration: 1 minute and 46.81 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 844
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5697045
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Negative: 20640
Negative (%): 100.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.8
Median: -118.49
Q3: -118.01
95-th percentile: -117.08
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.003531724
Coefficient of variation (CV): -0.01675618195
Kurtosis: -1.330152366
Mean: -119.5697045
Median Absolute Deviation (MAD): 1.28
Skewness: -0.297801208
Sum: -2467918.7
Variance: 4.014139367
Monotonicity: Not monotonic
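Several of the longitude statistics above are derived from the others. A minimal sketch of those relationships, using only values reported in this section; the variable names are illustrative and are not pandas-profiling identifiers:

```python
import math

# Values reported for longitude
n_observations = 20640
minimum, maximum = -124.35, -114.31
q1, q3 = -121.8, -118.01
mean = -119.5697045
std = 2.003531724

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation (negative because the mean is negative)
variance = std ** 2               # Variance
total = mean * n_observations     # Sum (no missing values in this column)

print(value_range, iqr, cv, variance, total)
```

The negative coefficient of variation is a consequence of the all-negative longitudes rather than a sign of unusual spread.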
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
-118.31               162     0.8%
-118.3                160     0.8%
-118.29               148     0.7%
-118.27               144     0.7%
-118.32               142     0.7%
-118.28               141     0.7%
-118.35               140     0.7%
-118.36               138     0.7%
-118.19               135     0.7%
-118.25               128     0.6%
Other values (834)    19202   93.0%

Minimum 10 values
Value                 Count   Frequency (%)
-124.35               1       < 0.1%
-124.3                2       < 0.1%
-124.27               1       < 0.1%
-124.26               1       < 0.1%
-124.25               1       < 0.1%
-124.23               3       < 0.1%
-124.22               1       < 0.1%
-124.21               3       < 0.1%
-124.19               4       < 0.1%
-124.18               6       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
-114.31               1       < 0.1%
-114.47               1       < 0.1%
-114.49               1       < 0.1%
-114.55               1       < 0.1%
-114.56               1       < 0.1%
-114.57               3       < 0.1%
-114.58               2       < 0.1%
-114.59               2       < 0.1%
-114.6                3       < 0.1%
-114.61               3       < 0.1%

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 862
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.63186143
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
Median: 34.26
Q3: 37.71
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.78

Descriptive statistics

Standard deviation: 2.135952397
Coefficient of variation (CV): 0.05994501302
Kurtosis: -1.117759781
Mean: 35.63186143
Median Absolute Deviation (MAD): 1.23
Skewness: 0.4659530037
Sum: 735441.62
Variance: 4.562292644
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
34.06                 244     1.2%
34.05                 236     1.1%
34.08                 234     1.1%
34.07                 231     1.1%
34.04                 221     1.1%
34.09                 212     1.0%
34.02                 208     1.0%
34.1                  203     1.0%
34.03                 193     0.9%
33.93                 181     0.9%
Other values (852)    18477   89.5%

Minimum 10 values
Value                 Count   Frequency (%)
32.54                 1       < 0.1%
32.55                 3       < 0.1%
32.56                 10      < 0.1%
32.57                 18      0.1%
32.58                 26      0.1%
32.59                 11      0.1%
32.6                  9       < 0.1%
32.61                 14      0.1%
32.62                 13      0.1%
32.63                 18      0.1%

Maximum 10 values
Value                 Count   Frequency (%)
41.95                 2       < 0.1%
41.92                 1       < 0.1%
41.88                 1       < 0.1%
41.86                 3       < 0.1%
41.84                 1       < 0.1%
41.82                 1       < 0.1%
41.81                 2       < 0.1%
41.8                  3       < 0.1%
41.79                 1       < 0.1%
41.78                 3       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.63948643
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
Median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58555761
Coefficient of variation (CV): 0.4394477408
Kurtosis: -0.8006288536
Mean: 28.63948643
Median Absolute Deviation (MAD): 10
Skewness: 0.0603306376
Sum: 591119
Variance: 158.3962604
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
52                    1273    6.2%
36                    862     4.2%
35                    824     4.0%
16                    771     3.7%
17                    698     3.4%
34                    689     3.3%
26                    619     3.0%
33                    615     3.0%
18                    570     2.8%
25                    566     2.7%
Other values (42)     13153   63.7%

Minimum 10 values
Value                 Count   Frequency (%)
1                     4       < 0.1%
2                     58      0.3%
3                     62      0.3%
4                     191     0.9%
5                     244     1.2%
6                     160     0.8%
7                     175     0.8%
8                     206     1.0%
9                     205     1.0%
10                    264     1.3%

Maximum 10 values
Value                 Count   Frequency (%)
52                    1273    6.2%
51                    48      0.2%
50                    136     0.7%
49                    134     0.6%
48                    177     0.9%
47                    198     1.0%
46                    245     1.2%
45                    294     1.4%
44                    356     1.7%
43                    353     1.7%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5926
Distinct (%): 28.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2635.763081
Minimum: 2
Maximum: 39320
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 620.95
Q1: 1447.75
Median: 2127
Q3: 3148
95-th percentile: 6213.2
Maximum: 39320
Range: 39318
Interquartile range (IQR): 1700.25

Descriptive statistics

Standard deviation: 2181.615252
Coefficient of variation (CV): 0.8276977802
Kurtosis: 32.630927
Mean: 2635.763081
Median Absolute Deviation (MAD): 797
Skewness: 4.147343451
Sum: 54402150
Variance: 4759445.106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
1527                  18      0.1%
1582                  17      0.1%
1613                  17      0.1%
2127                  16      0.1%
2053                  15      0.1%
1607                  15      0.1%
1471                  15      0.1%
1717                  15      0.1%
1703                  15      0.1%
1722                  15      0.1%
Other values (5916)   20482   99.2%

Minimum 10 values
Value                 Count   Frequency (%)
2                     1       < 0.1%
6                     1       < 0.1%
8                     1       < 0.1%
11                    1       < 0.1%
12                    1       < 0.1%
15                    2       < 0.1%
16                    1       < 0.1%
18                    4       < 0.1%
19                    2       < 0.1%
20                    2       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
39320                 1       < 0.1%
37937                 1       < 0.1%
32627                 1       < 0.1%
32054                 1       < 0.1%
30450                 1       < 0.1%
30405                 1       < 0.1%
30401                 1       < 0.1%
28258                 1       < 0.1%
27870                 1       < 0.1%
27700                 1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1923
Distinct (%): 9.4%
Missing: 207
Missing (%): 1.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8705525
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 137
Q1: 296
Median: 435
Q3: 647
95-th percentile: 1275.4
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 421.3850701
Coefficient of variation (CV): 0.7834321252
Kurtosis: 21.98557506
Mean: 537.8705525
Median Absolute Deviation (MAD): 162
Skewness: 3.459546332
Sum: 10990309
Variance: 177565.3773
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
280                   55      0.3%
331                   51      0.2%
345                   50      0.2%
393                   49      0.2%
343                   49      0.2%
394                   48      0.2%
328                   48      0.2%
348                   48      0.2%
272                   47      0.2%
309                   47      0.2%
Other values (1913)   19941   96.6%
(Missing)             207     1.0%

Minimum 10 values
Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     5       < 0.1%
4                     7       < 0.1%
5                     6       < 0.1%
6                     5       < 0.1%
7                     6       < 0.1%
8                     8       < 0.1%
9                     7       < 0.1%
10                    8       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
6445                  1       < 0.1%
6210                  1       < 0.1%
5471                  1       < 0.1%
5419                  1       < 0.1%
5290                  1       < 0.1%
5033                  1       < 0.1%
5027                  1       < 0.1%
4957                  1       < 0.1%
4952                  1       < 0.1%
4819                  1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3888
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1425.476744
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 3
5-th percentile: 348
Q1: 787
Median: 1166
Q3: 1725
95-th percentile: 3288
Maximum: 35682
Range: 35679
Interquartile range (IQR): 938

Descriptive statistics

Standard deviation: 1132.462122
Coefficient of variation (CV): 0.7944444737
Kurtosis: 73.55311639
Mean: 1425.476744
Median Absolute Deviation (MAD): 440
Skewness: 4.935858227
Sum: 29421840
Variance: 1282470.457
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
891                   25      0.1%
1227                  24      0.1%
1052                  24      0.1%
761                   24      0.1%
850                   24      0.1%
825                   23      0.1%
999                   22      0.1%
1005                  22      0.1%
782                   22      0.1%
781                   21      0.1%
Other values (3878)   20409   98.9%

Minimum 10 values
Value                 Count   Frequency (%)
3                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
8                     4       < 0.1%
9                     2       < 0.1%
11                    1       < 0.1%
13                    4       < 0.1%
14                    3       < 0.1%
15                    2       < 0.1%
17                    2       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
35682                 1       < 0.1%
28566                 1       < 0.1%
16305                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%
15037                 1       < 0.1%
13251                 1       < 0.1%
12873                 1       < 0.1%
12427                 1       < 0.1%
12203                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1815
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 499.5396802
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 125
Q1: 280
Median: 409
Q3: 605
95-th percentile: 1162
Maximum: 6082
Range: 6081
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 382.3297528
Coefficient of variation (CV): 0.7653641301
Kurtosis: 22.05798806
Mean: 499.5396802
Median Absolute Deviation (MAD): 151
Skewness: 3.410437712
Sum: 10310499
Variance: 146176.0399
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
306                   57      0.3%
335                   56      0.3%
386                   56      0.3%
282                   55      0.3%
429                   54      0.3%
375                   53      0.3%
297                   51      0.2%
284                   51      0.2%
340                   50      0.2%
362                   50      0.2%
Other values (1805)   20107   97.4%

Minimum 10 values
Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     3       < 0.1%
3                     4       < 0.1%
4                     4       < 0.1%
5                     7       < 0.1%
6                     5       < 0.1%
7                     10      < 0.1%
8                     8       < 0.1%
9                     9       < 0.1%
10                    7       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
6082                  1       < 0.1%
5358                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4930                  1       < 0.1%
4855                  1       < 0.1%
4769                  1       < 0.1%
4616                  1       < 0.1%
4490                  1       < 0.1%
4372                  1       < 0.1%

median_income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12928
Distinct (%): 62.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.870671003
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.60057
Q1: 2.5634
Median: 3.5348
Q3: 4.74325
95-th percentile: 7.300305
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.17985

Descriptive statistics

Standard deviation: 1.899821718
Coefficient of variation (CV): 0.4908249026
Kurtosis: 4.952524102
Mean: 3.870671003
Median Absolute Deviation (MAD): 1.0642
Skewness: 1.646656702
Sum: 79890.6495
Variance: 3.60932256
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
3.125                 49      0.2%
15.0001               49      0.2%
2.875                 46      0.2%
4.125                 44      0.2%
2.625                 44      0.2%
3.875                 41      0.2%
3                     38      0.2%
3.375                 38      0.2%
3.625                 37      0.2%
4                     37      0.2%
Other values (12918)  20217   98.0%

Minimum 10 values
Value                 Count   Frequency (%)
0.4999                12      0.1%
0.536                 10      < 0.1%
0.5495                1       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%
0.6825                1       < 0.1%
0.6831                1       < 0.1%
0.696                 1       < 0.1%
0.6991                1       < 0.1%
0.7007                1       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
15.0001               49      0.2%
15                    2       < 0.1%
14.9009               1       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%
14.4113               1       < 0.1%
14.2959               1       < 0.1%
14.2867               1       < 0.1%
13.947                1       < 0.1%
13.8556               1       < 0.1%

median_house_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3842
Distinct (%): 18.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206855.8169
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 161.4 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66200
Q1: 119600
Median: 179700
Q3: 264725
95-th percentile: 489810
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145125

Descriptive statistics

Standard deviation: 115395.6159
Coefficient of variation (CV): 0.55785531
Kurtosis: 0.3278702429
Mean: 206855.8169
Median Absolute Deviation (MAD): 68400
Skewness: 0.9777632739
Sum: 4269504061
Variance: 1.331614816 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
500001                965     4.7%
137500                122     0.6%
162500                117     0.6%
112500                103     0.5%
187500                93      0.5%
225000                92      0.4%
350000                79      0.4%
87500                 78      0.4%
275000                65      0.3%
150000                64      0.3%
Other values (3832)   18862   91.4%

Minimum 10 values
Value                 Count   Frequency (%)
14999                 4       < 0.1%
17500                 1       < 0.1%
22500                 4       < 0.1%
25000                 1       < 0.1%
26600                 1       < 0.1%
26900                 1       < 0.1%
27500                 1       < 0.1%
28300                 1       < 0.1%
30000                 2       < 0.1%
32500                 4       < 0.1%

Maximum 10 values
Value                 Count   Frequency (%)
500001                965     4.7%
500000                27      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%
498700                1       < 0.1%
498600                1       < 0.1%
498400                1       < 0.1%
497600                1       < 0.1%
497400                1       < 0.1%

ocean_proximity
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 161.4 KiB

Value                 Count
<1H OCEAN             9136
INLAND                6551
NEAR OCEAN            2658
NEAR BAY              2290
ISLAND                5

Length

Max length: 10
Median length: 9
Mean length: 8.064922481
Min length: 6
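The length statistics above can be reproduced from the five category labels and their counts. A minimal sketch using only the Python standard library; the `counts` dictionary is transcribed from this section, and the variable names are illustrative:

```python
from statistics import median

# Category counts for ocean_proximity, as reported in this section
counts = {
    "<1H OCEAN": 9136,
    "INLAND": 6551,
    "NEAR OCEAN": 2658,
    "NEAR BAY": 2290,
    "ISLAND": 5,
}

n_rows = sum(counts.values())

# Total characters: each label's length weighted by how often it occurs
total_characters = sum(len(label) * k for label, k in counts.items())
mean_length = total_characters / n_rows

# Expand to one length per row to take the median, minimum and maximum
lengths = [len(label) for label, k in counts.items() for _ in range(k)]
median_length = median(lengths)

# Distinct characters across all labels (the space counts as a character)
distinct_characters = len(set("".join(counts)))

print(n_rows, total_characters, mean_length, median_length,
      min(lengths), max(lengths), distinct_characters)
```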

Characters and Unicode

Total characters: 166460
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NEAR BAY
2nd row: NEAR BAY
3rd row: NEAR BAY
4th row: NEAR BAY
5th row: NEAR BAY

Common Values

Value                 Count   Frequency (%)
<1H OCEAN             9136    44.3%
INLAND                6551    31.7%
NEAR OCEAN            2658    12.9%
NEAR BAY              2290    11.1%
ISLAND                5       < 0.1%

Length

Histogram of lengths of the category

Pie chart

Words

Value                 Count   Frequency (%)
ocean                 11794   34.0%
1h                    9136    26.3%
inland                6551    18.9%
near                  4948    14.2%
bay                   2290    6.6%
island                5       < 0.1%

Most occurring characters

Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Most occurring categories

Value                 Count   Frequency (%)
Uppercase Letter      134104  80.6%
Space Separator       14084   8.5%
Math Symbol           9136    5.5%
Decimal Number        9136    5.5%

Most frequent character per category

Uppercase Letter
Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Space Separator
Value                 Count   Frequency (%)
(space)               14084   100.0%

Math Symbol
Value                 Count   Frequency (%)
<                     9136    100.0%

Decimal Number
Value                 Count   Frequency (%)
1                     9136    100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Latin                 134104  80.6%
Common                32356   19.4%

Most frequent character per script

Latin
Value                 Count   Frequency (%)
N                     29849   22.3%
A                     25588   19.1%
E                     16742   12.5%
O                     11794   8.8%
C                     11794   8.8%
H                     9136    6.8%
I                     6556    4.9%
L                     6556    4.9%
D                     6556    4.9%
R                     4948    3.7%
Other values (3)      4585    3.4%

Common
Value                 Count   Frequency (%)
(space)               14084   43.5%
<                     9136    28.2%
1                     9136    28.2%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 166460  100.0%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
N                     29849   17.9%
A                     25588   15.4%
E                     16742   10.1%
(space)               14084   8.5%
O                     11794   7.1%
C                     11794   7.1%
<                     9136    5.5%
1                     9136    5.5%
H                     9136    5.5%
I                     6556    3.9%
Other values (6)      22645   13.6%

Interactions

[Figures: pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
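The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code pandas-profiling runs; the function name `pearson_r` is hypothetical, and the sample values are the first five `median_income` and `median_house_value` rows from the sample table of this report.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# First five rows of the dataset (median_income, median_house_value)
income = [8.3252, 8.3014, 7.2574, 5.6431, 3.8462]
house_value = [452600, 358500, 352100, 341300, 342200]

r = pearson_r(income, house_value)

# Shifting and rescaling one variable leaves r unchanged
r_rescaled = pearson_r([1000 * v + 7 for v in income], house_value)

print(r, r_rescaled)
```

Five rows are far too few to say anything about the full dataset; the point of the example is only the invariance of r under changes of location and scale.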

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
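A minimal sketch of this rank-then-correlate procedure in plain Python follows. It is illustrative only: the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption about tie handling that the text above does not spell out.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: y = x ** 3
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this cubic example the ranks agree perfectly, so ρ reaches 1 while Pearson's r stays below 1.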

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
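The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch in plain Python, with the hypothetical function name `kendall_tau_a`; library implementations commonly use a tie-adjusted variant, so their results can differ when the data contain ties. The sample values are the first five `median_income` and `median_house_value` rows from the sample table of this report.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in either variable count as neither
    return (concordant - discordant) / len(pairs)

# First five rows of the dataset (median_income, median_house_value)
income = [8.3252, 8.3014, 7.2574, 5.6431, 3.8462]
house_value = [452600, 358500, 352100, 341300, 342200]

tau = kendall_tau_a(income, house_value)
print(tau)
```

The double loop over pairs is quadratic in the number of observations, which is fine for a sketch but slow for all 20640 rows.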

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
0      -122.23    37.88     41                  880          129.0           322         126         8.3252         452600              NEAR BAY
1      -122.22    37.86     21                  7099         1106.0          2401        1138        8.3014         358500              NEAR BAY
2      -122.24    37.85     52                  1467         190.0           496         177         7.2574         352100              NEAR BAY
3      -122.25    37.85     52                  1274         235.0           558         219         5.6431         341300              NEAR BAY
4      -122.25    37.85     52                  1627         280.0           565         259         3.8462         342200              NEAR BAY
5      -122.25    37.85     52                  919          213.0           413         193         4.0368         269700              NEAR BAY
6      -122.25    37.84     52                  2535         489.0           1094        514         3.6591         299200              NEAR BAY
7      -122.25    37.84     52                  3104         687.0           1157        647         3.1200         241400              NEAR BAY
8      -122.26    37.84     42                  2555         665.0           1206        595         2.0804         226700              NEAR BAY
9      -122.25    37.84     52                  3549         707.0           1551        714         3.6912         261100              NEAR BAY

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value  ocean_proximity
20630  -121.32    39.29     11                  2640         505.0           1257        445         3.5673         112000              INLAND
20631  -121.40    39.33     15                  2655         493.0           1200        432         3.5179         107200              INLAND
20632  -121.45    39.26     15                  2319         416.0           1047        385         3.1250         115600              INLAND
20633  -121.53    39.19     27                  2080         412.0           1082        382         2.5495         98300               INLAND
20634  -121.56    39.27     28                  2332         395.0           1041        344         3.7125         116800              INLAND
20635  -121.09    39.48     25                  1665         374.0           845         330         1.5603         78100               INLAND
20636  -121.21    39.49     18                  697          150.0           356         114         2.5568         77100               INLAND
20637  -121.22    39.43     17                  2254         485.0           1007        433         1.7000         92300               INLAND
20638  -121.32    39.43     18                  1860         409.0           741         349         1.8672         84700               INLAND
20639  -121.24    39.37     16                  2785         616.0           1387        530         2.3886         89400               INLAND